Genetic Programming for Automatically Constructing Data Mining Algorithms
نویسندگان
چکیده
At present there is a wide range of data mining algorithms available to researchers and practitioners (Witten & Frank, 2005; Tan et al., 2006). Despite the great diversity of these algorithms, virtually all of them share one feature: they have been manually designed. As a result, current data mining algorithms in general incorporate human biases and preconceptions in their designs. This article proposes an alternative approach to the design of data mining algorithms, namely the automatic creation of data mining algorithms by means of Genetic Programming (GP) (Pappa & Freitas, 2006). In essence, GP is a type of Evolutionary Algorithm – i.e., a search algorithm inspired by the Darwinian process of natural selection – that evolves computer programs or executable structures. This approach opens new avenues for research, providing the means to design novel data mining algorithms that are less limited by human biases and preconceptions, and so offer the potential to discover new kinds of patterns (or knowledge) to the user. It also offers an interesting opportunity for the automatic creation of data mining algorithms tailored to the data being mined.
منابع مشابه
Discovering New Rule Induction Algorithms with Grammar-based Genetic Programming
Rule induction is a data mining technique used to extract classification rules of the form IF (conditions) THEN (predicted class) from data. The majority of the rule induction algorithms found in the literature follow the sequential covering strategy, which essentially induces one rule at a time until (almost) all the training data is covered by the induced rule set. This strategy describes a b...
متن کاملData Mining in R using Rattle
This paper is a brief introduction to the concepts, methods and algorithms for data mining in statistical software R using a package named Rattle. Rattle provides a good graphical environment to perform some of the procedures and algorithms without the need for programming. Some parts of the package will be explained by a number of examples. ...
متن کاملAutomatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining
Industrial-Scale R&D IT Projects depend on many sub-technologies which need to be understood and have their risks analysed before the project can begin for their success. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other is often complex and form a network. Discovery of this network of technologies is time consumi...
متن کاملMiningZinc: A declarative framework for constraint-based mining
We introduce MiningZinc, a declarative framework for constraint-based data mining. MiningZinc consists of two key components: a language component and an execution mechanism. First, the MiningZinc language allows for high-level and natural modeling of mining problems, so that MiningZinc models are similar to the mathematical definitions used in the literature. It is inspired by the Zinc family ...
متن کاملA Novel Experimental Analysis of the Minimum Cost Flow Problem
In the GA approach the parameters that influence its performance include population size, crossover rate and mutation rate. Genetic algorithms are suitable for traversing large search spaces since they can do this relatively fast and because the mutation operator diverts the method away from local optima, which will tend to become more common as the search space increases in size. GA’s are base...
متن کامل